22_hashing_efficiency_and_load

Defining the Load Factor ($\lambda$)

The Load Factor ($\lambda$) is the central metric determining the real-world efficiency of a hash table. It quantifies how full the table is, directly correlating to the probability and length of collisions.

$$ \lambda = \frac{\text{Number of Items Stored } (N)}{\text{Size of Table/Buckets } (M)} $$

Separate Chaining: $\lambda$ represents the average length of the linked lists (chains).
Open Addressing (Probing): $\lambda$ represents the table saturation, directly correlating to the expected number of probes required to find an empty slot or a specific item.

Load Factor & Complexity

To achieve the theoretical $O(1)$ average case performance for Search, Insert, and Delete operations, the Load Factor ($\lambda$) must be constant and generally small (e.g., $< 1.0$ or $< 0.75$).

Performance Factor	Average Case (Target $\lambda$)	Worst Case (High $\lambda$)
Search/Insert Time	$O(1 + \lambda) \approx O(1)$	$O(N)$ (Approaches linear search)
Collision Rate	Low/Manageable	High
Space Overhead	$O(N + M)$	$O(N + M)$

Key Implication: If $M$ is fixed and $N$ grows, $\lambda$ increases, degrading performance toward $O(N)$.

Strategy: Managing Saturation (Re-hashing)

To prevent performance degradation caused by high $\lambda$, efficient hash tables implement Re-hashing (resizing).

Threshold Monitoring: The system monitors $\lambda$. If it exceeds a predefined threshold (e.g., 0.75 for Open Addressing, or 2.0 for Chaining), resizing is triggered.
Expansion: The table size $M$ is increased (e.g., doubled to $M'$).
Redistribution: All existing $N$ items are re-inserted into the new, larger table $M'$, using a potentially new hash function (mod $M'$).

Result: $\lambda$ immediately drops, restoring the structure's $O(1)$ average performance guarantee. Note that the re-hashing operation itself temporarily costs $O(N)$.

📝 Interactive Quiz

1. What is the primary purpose of re-hashing a hash table?

A) To sort the elements within the table.
B) To reduce the load factor and maintain $O(1)$ average performance.
C) To immediately free up unused memory slots.
D) To check for data corruption.

2. In a hash table using Separate Chaining, if $N=20$ items are stored in $M=10$ buckets, what does the load factor $\lambda=2$ represent?

A) The table is completely full and cannot accept new items.
B) The worst-case search time is exactly 2 comparisons.
C) The average length of the linked lists (chains) is 2.
D) At least one bucket must contain exactly 2 items.

3. A system uses an Open Addressing hash table with $M=10,000$ slots. It stores $N=1,500$ items. What is the expected average time complexity for a search?

A) $O(1)$
B) $O(\log N)$
C) $O(N)$
D) $O(\lambda)$

4. Although a single re-hash operation costs $O(N)$, why is the amortized cost of insertion still considered $O(1)$?

A) The high cost is infrequent and spread out over many cheap insertions.
B) Re-hashing only happens when the table is empty.
C) The $O(N)$ cost is a theoretical maximum that never actually occurs.
D) Modern CPUs can perform the re-hash in a single clock cycle.

KEY TIPS

💡 Key Tips

Practical Thresholds

For Open Addressing, re-hashing is often triggered when $\lambda$ hits 0.5 to 0.75. For Separate Chaining, thresholds can be higher, like 1.0 or 2.0.

Amortized Analysis

While a single re-hash costs $O(N)$, this cost is spread out over many cheap $O(1)$ insertions. The average, or "amortized", cost of an insertion remains $O(1)$.

Prime Table Sizes

Using a prime number for the table size $M$ helps distribute keys more uniformly, reducing collisions, especially when using a simple hash function like key % M.